Accelerating Workloads on FPGAs via OpenCL: A Case Study with OpenDwarfs

نویسندگان

  • Anshuman Verma
  • Ahmed E. Helal
  • Konstantinos Krommydas
  • Wu-Chun Feng
چکیده

For decades, the streaming architecture of FPGAs has delivered accelerated performance across many application domains, such as option pricing solvers in finance, computational fluid dynamics in oil and gas, and packet processing in network routers and firewalls. However, this performance comes at the expense of programmability. FPGA developers use hardware design languages (HDLs) to implement the application data and control path and to design hardware modules for computational pipelines, memory management, synchronization, and communication. This process requires extensive knowledge of logic design, design automation tools, and low-level details of FPGA architecture, this consumes significant development time and effort. To address this lack of programmability of FPGAs, OpenCL provides an easy-to-use and portable programming model for CPUs, GPUs, APUs, and now, FPGAs. Although this significantly improved programmability yet an optimized GPU implementation of kernel may lack performance portability for FPGA. To improve the performance of OpenCL kernels on FPGAs we identify general techniques to optimize OpenCL kernels for FPGAs under device-specific hardware constraints. We then apply these optimizations techniques to the OpenDwarfs benchmark suite, which has diverse parallelism profiles and memory access patterns, in order to evaluate the effectiveness of the optimizations in terms of performance and resource utilization. Finally, we present the performance of structured grids and N-body dwarf-based benchmarks in the context of various optimization along with their potential re-factoring. We find that careful design of kernels for FPGA can result in a highly efficient pipeline achieving 91% of theoretical throughput for the structured grids dwarf.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Evaluation of ‘OpenCL for FPGA’ for Data Acquisition and Acceleration in High Energy Physics

The increase in the data acquisition and processing needs of High Energy Physics experiments has made it more essential to use FPGAs to meet those needs. However harnessing the capabilities of the FPGAs has been hard for anyone but expert FPGA developers. The arrival of OpenCL with the two major FPGA vendors supporting it, offers an easy software-based approach to taking advantage of FPGAs in a...

متن کامل

Benchmarking under the hood of OpenCL FPGA platforms

The end of Moore’s law creates a significant turning point for computer architecture. Today, performance is largely limited by energy, power, and cooling. Heterogeneity and radical new architecture designs are keys to achieving higher energy proportionality. In mobile computing, heterogeneity is well adopted in system-on-chip designs (e.g., to improve battery life). In high-performance computin...

متن کامل

OpenCL for FPGAs: Prototyping a Compiler

Hardware acceleration using FPGAs has shown orders of magnitude reduction in runtime of computationally-intensive applications in comparison to traditional stand-alone computers [1]. This is possible because on an FPGA many computations can be performed at the same time in a trulyparallel fashion. However, parallel computation at a hardware level requires a great deal of expertise, which limits...

متن کامل

OpenDwarfs: Characterization of Dwarf-Based Benchmarks on Fixed and Reconfigurable Architectures

The proliferation of heterogeneous computing platforms presents the parallel computing community with new challenges. One such challenge entails evaluating the efficacy of such parallel architectures and identifying the architectural innovations that ultimately benefit applications. To address this challenge, we need benchmarks that capture the execution patterns (i.e., dwarfs or motifs) of app...

متن کامل

Cooperative Kernels: GPU Multitasking for Blocking Algorithms (Extended Version)

There is growing interest in accelerating irregular data-parallel algorithms on GPUs. These algorithms are typically blocking, so they require fair scheduling. But GPU programming models (e.g. OpenCL) do not mandate fair scheduling, and GPU schedulers are unfair in practice. Current approaches avoid this issue by exploiting scheduling quirks of today’s GPUs in a manner that does not allow the G...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016